Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences
Authors
Abstract
The paper proposes a sentence alignment method based on a maximum entropy model and anchor sentences for aligning ancient and modern Chinese sentences in historical classics. The method selects as anchor sentence pairs those pairs that share the same phrase at the beginning or end of the sentence, or that share the same time phrase, and uses them to divide each paragraph into several sections. The sentences in each section are then aligned with a dynamic programming algorithm according to the entropy calculated by the maximum entropy model. The maximum entropy model uses three features: an improved Chinese co-occurrence character feature, a length feature, and a sentence alignment mode feature. The co-occurrence character feature is improved by giving characters at different positions different weights according to their contribution to sentence alignment. In experiments on ShiJi, the proposed method reaches a precision of 95.9% and a recall of 95.6%, significantly outperforming other sentence alignment methods.
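The pipeline described above (anchor pairs cut a paragraph into sections, then dynamic programming aligns the sentences inside each section) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the alignment mode set, the position weights in the hypothetical `weighted_cooccurrence`, the expected length ratio, and the hand-set weights in the hypothetical `pair_score` are all placeholders standing in for the trained maximum entropy model, and anchor pairs are passed in as given rather than detected.

```python
import math
from typing import List, Optional, Sequence, Tuple

Alignment = List[Tuple[List[int], List[int]]]

# (ancient sentences, modern sentences) consumed per step; an ASSUMED mode set.
MODES: Tuple[Tuple[int, int], ...] = ((1, 1), (1, 0), (0, 1), (1, 2), (2, 1))
# ASSUMED log-prior per non-empty mode (stand-in for the alignment mode feature).
MODE_PRIOR = {(1, 1): 0.0, (1, 2): -0.5, (2, 1): -0.5}
SKIP_PENALTY = -3.0   # ASSUMED score for 1-0 and 0-1 steps
EXPECTED_RATIO = 1.5  # ASSUMED modern/ancient length ratio in characters


def weighted_cooccurrence(anc: str, mod: str) -> float:
    """Weighted share of ancient characters that also occur in the modern
    sentence; first and last two positions count double (ASSUMED weights)."""
    if not anc:
        return 0.0
    mod_chars = set(mod)
    total = hits = 0.0
    for i, ch in enumerate(anc):
        w = 2.0 if i < 2 or i >= len(anc) - 2 else 1.0
        total += w
        if ch in mod_chars:
            hits += w
    return hits / total


def pair_score(anc: str, mod: str, mode: Tuple[int, int]) -> float:
    """Hand-weighted log-linear score, a placeholder for the trained ME model."""
    if not anc or not mod:
        return SKIP_PENALTY
    # Penalize deviation of the length ratio from the assumed expected ratio.
    length_term = -abs(math.log(len(mod) / len(anc) / EXPECTED_RATIO))
    return 3.0 * weighted_cooccurrence(anc, mod) + length_term + MODE_PRIOR[mode]


def align_section(anc: Sequence[str], mod: Sequence[str]) -> Alignment:
    """Dynamic programming over (i, j) = sentences consumed on each side."""
    n, m = len(anc), len(mod)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -math.inf:
                continue
            for di, dj in MODES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = best[i][j] + pair_score("".join(anc[i:ni]), "".join(mod[j:nj]), (di, dj))
                if s > best[ni][nj]:
                    best[ni][nj], back[ni][nj] = s, (di, dj)
    # Trace the best path back from (n, m) to (0, 0).
    path: Alignment = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        path.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    path.reverse()
    return path


def align_with_anchors(anc: Sequence[str], mod: Sequence[str],
                       anchors: Sequence[Tuple[int, int]]) -> Alignment:
    """Cut the paragraph at anchor pairs (strictly increasing (i, j), ASSUMED to be
    detected already) and align each section between anchors independently."""
    result: Alignment = []
    pi = pj = 0
    for ai, aj in list(anchors) + [(len(anc), len(mod))]:
        for a_idx, m_idx in align_section(anc[pi:ai], mod[pj:aj]):
            result.append(([pi + k for k in a_idx], [pj + k for k in m_idx]))
        if ai < len(anc):  # a real anchor, not the end-of-paragraph sentinel
            result.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return result
```

For example, `align_with_anchors(ancient, modern, [(3, 4)])` treats ancient sentence 3 and modern sentence 4 as a fixed pair and aligns the sentences before and after it separately. Cutting at anchors keeps each dynamic programming table small and stops an early misalignment from propagating across the whole paragraph.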
Similar resources
Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences
This paper, which builds on previous studies on sentence alignment, introduces a sentence alignment method in which some sentences are used as “anchors” and a two step procedure is applied. In the first step, some lexical information such as proper names, technical terms, numbers and punctuation marks, location information and length information are used to generate anchor sentences that satisf...
Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches
Sentence compression is a task of creating a short grammatical sentence by removing extraneous words or phrases from an original sentence while preserving its meaning. Existing methods learn statistics on trimming context-free grammar (CFG) rules. However, these methods sometimes eliminate the original meaning by incorrectly removing important parts of sentences, because trimming probabilities ...
Comparison of Alignment Templates and Maximum Entropy Models for NLP
ich würde gerne von Köln nach München fahren In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language...
Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy
This paper proposes a new approach to segmentation of utterances into sentences using a new linguistic model based upon Maximum-entropy-weighted Bidirectional N-grams. The usual N-gram algorithm searches for sentence boundaries in a text from left to right only. Thus a candidate sentence boundary in the text is evaluated mainly with respect to its left context, without fully considering its rig...
Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm
We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm is composed of a length-based and anchor matching method that uses Named Entity recognition. This algorithm combines the speed of length-based models with the accuracy of anchor finding methods. The accuracy of finding cognates for Hungarian-English language pair is e...